5.4.3 Cluster and robust estimation

The options robust and cluster() specify robust and cluster estimation, respectively. Either option produces regression estimates with adjusted standard errors for the estimated coefficients. The associated t-, z- and p-values are affected as well. Other values are the same as in standard estimation.

Note that robust and cluster() cannot be combined, since cluster() already implies robust estimation.

Robust estimation can be used where there is a suspicion of problematic outliers or heteroskedasticity.
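To illustrate what "adjusted standard errors" means, the sketch below fits a one-variable regression in plain Python and computes both the classical and a robust standard error for the slope. The section does not say which robust estimator the system uses; the sketch assumes the common HC1 "sandwich" variant. The function name and the data are made up for illustration.

```python
import math

def slope_standard_errors(x, y):
    """Fit y = a + b*x by OLS; return (b, classical SE, HC1 robust SE) for b."""
    n, k = len(x), 2  # k = number of coefficients (intercept + slope)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    dx = [xi - x_mean for xi in x]
    sxx = sum(d * d for d in dx)
    b = sum(d * yi for d, yi in zip(dx, y)) / sxx
    a = y_mean - b * x_mean
    resid = [yi - a - b * xi for xi, yi in zip(x, y)]

    # Classical SE: assumes one common residual variance for all observations.
    sigma2 = sum(e * e for e in resid) / (n - k)
    se_classical = math.sqrt(sigma2 / sxx)

    # Robust SE: each observation contributes its own squared residual,
    # so unequal residual variance (heteroskedasticity) is allowed.
    # The factor n / (n - k) is the HC1 small-sample correction (assumed here).
    meat = sum((d * e) ** 2 for d, e in zip(dx, resid))
    se_robust = math.sqrt(n / (n - k) * meat) / sxx
    return b, se_classical, se_robust

# Made-up data in which the spread of y grows with x.
x = [1, 2, 3, 4, 5, 6]
y = [1.0, 2.1, 2.9, 4.4, 4.6, 6.8]
b, se_classical, se_robust = slope_standard_errors(x, y)
print("slope:", b)
print("classical SE:", se_classical)
print("robust SE:", se_robust)
```

The slope is the same in both cases; only the standard error, and with it the t- and p-values, changes.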

Cluster estimation is used when systematic dependencies are suspected within groups of observations, e.g. within schools or municipalities. The groups are specified through a variable (the cluster variable) given in the parentheses of the cluster option, e.g. cluster(school) or cluster(municipality). The following conditions must be met; otherwise the system returns an error message:

The number of groups must be of a certain size.
The cluster variable must be numeric.
The cluster variable cannot be included as a variable in the regression expression.
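The idea behind cluster estimation can be sketched in plain Python for a one-variable regression: residual contributions are summed within each group before they are squared, so observations in the same group are not treated as independent. The exact estimator and small-sample correction used by the system are not stated in this section; the correction below is an assumption, and the function name, the data and the school codes are hypothetical.

```python
import math
from collections import defaultdict

def cluster_slope_se(x, y, groups):
    """Fit y = a + b*x by OLS; return (b, cluster-robust SE of b)."""
    n, k = len(x), 2  # k = number of coefficients (intercept + slope)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    dx = [xi - x_mean for xi in x]
    sxx = sum(d * d for d in dx)
    b = sum(d * yi for d, yi in zip(dx, y)) / sxx
    a = y_mean - b * x_mean
    resid = [yi - a - b * xi for xi, yi in zip(x, y)]

    # Sum the contributions within each group first, then square the sums.
    score_by_group = defaultdict(float)
    for d, e, g in zip(dx, resid, groups):
        score_by_group[g] += d * e
    n_groups = len(score_by_group)
    meat = sum(s * s for s in score_by_group.values())

    # Assumed small-sample correction; undefined when there is only one group.
    correction = n_groups / (n_groups - 1) * (n - 1) / (n - k)
    return b, math.sqrt(correction * meat) / sxx

# Made-up data: six pupils in three schools, identified by numeric codes.
x = [1, 2, 3, 4, 5, 6]
y = [1.0, 2.1, 2.9, 4.4, 4.6, 6.8]
school = [101, 101, 102, 102, 103, 103]
b, se_cluster = cluster_slope_se(x, y, school)
print("slope:", b)
print("cluster SE:", se_cluster)
```

In this sketch, giving every observation its own group reproduces a robust (HC1) standard error, which is one way to see why cluster estimation implies robust estimation. With a single group the correction factor divides by zero, so the sketch needs at least two groups.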

Examples:

regress income man married high_education, robust

regress income man married high_education, cluster(municipality)

The robust and cluster() options can also be used with other regression types.